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This paper deals with the factor modeling for high-dimensional time 
series based on a dimension-reduction viewpoint. Under stationary set- 
tings, the inference is simple in the sense that both the number of fac- 
tors and the factor loadings are estimated in terms of an eigenanalysis 
for a nonnegative definite matrix, and is therefore applicable when the 
dimension of time series is on the order of a few thousands. Asymptotic 
properties of the proposed method are investigated under two settings: 
(i) the sample size goes to infinity while the dimension of time series is 
fixed; and (ii) both the sample size and the dimension of time series go to 
infinity together. In particular, our estimators for zero-eigenvalues enjoy 
faster convergence (or slower divergence) rates, hence making the esti- 
mation for the number of factors easier. In particular, when the sample 
size and the dimension of time series go to infinity together, the estima- 
tors for the eigenvalues are no longer consistent. However, our estimator 
for the number of the factors, which is based on the ratios of the esti- 
mated eigenvalues, still works fine. Furthermore, this estimation shows 
the so-called "blessing of dimensionality" property in the sense that the 
performance of the estimation may improve when the dimension of time 
series increases. A two-step procedure is investigated when the factors 
are of different degrees of strength. Numerical illustration with both sim- 
ulated and real data is also reported. 

1. Introduction. The analysis of multivariate time series data is of in- 
creased interest and importance in the modern information age. Although 
the methods and the associate theory for univariate time series analysis are 
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well developed and understood, the picture for the multivariate cases is less 
complete. In spite of the fact that the conventional univariate time series 
models (such as ARMA) and the associated time-domain and frequency- 
domain methods have been formally extended to multivariate cases, their 
usefulness is often limited. One may face serious issues such as the lack of 
model identification or flat likelihood functions. In fact vector ARMA mod- 
els are seldom used directly in practice. Dimension-reduction via, for ex- 
ample, reduced-rank structure, structural indices, scalar component models 
and canonical correlation analysis is more pertinent in modeling multivariate 
time series data. See [10, 14, 20, 22]. 

In this paper we deal with the factor modeling for multivariate time series 
from a dimension-reduction viewpoint. Differently from the factor analysis 
for independent observations, we search for the factors which drive the serial 
dependence of the original time series. Early attempts in this direction in- 
clude [1, 5, 16, 18, 21, 23, 25]. More recent efforts focus on the inference when 
the dimension of time series is as large as or even greater than the sample 
size; see, for example, [13] and the references within. High-dimensional time 
series data are often encountered nowadays in many fields including finance, 
economics, environmental and medical studies. For example, understanding 
the dynamics of the returns of large numbers of assets is the key for as- 
set pricing, portfolio allocation, and risk management. Panel time series are 
commonplace in studying economic and business phenomena. Environmen- 
tal time series are often of a high dimension due to a large number of indices 
monitored across many different locations. 

Our approach is from a dimension-reduction point of view. The model 
adopted can be traced back at least to that of [18]. We decompose a high- 
dimensional time series into two parts: a dynamic part driven by, hopefully, 
a lower-dimensional factor time series, and a static part which is a vector 
white noise. Since the white noise exhibits no serial correlations, the decom- 
position is unique in the sense that both the number of factors (i.e., the 
dimension of the factor process) and the factor loading space in our model 
are identifiable. Such a conceptually simple decomposition also makes the 
statistical inference easy. Although the setting allows the factor process to 
be nonstationary (see [16]; also Section 2.1 below), we focus on stationary 
models only in this paper: under the stationary condition, the estimation for 
both the number of factors and the factor loadings is carried out in an eigen- 
analysis for a nonnegative definite matrix, and is therefore applicable when 
the dimension of time series is on the order of a few thousands. Further- 
more, the asymptotic properties of the proposed method are investigated 
under two settings: (i) the sample size goes to infinity while the dimension 
of time series is fixed; and (ii) both the sample size and the dimension of 
time series go to infinity together. In particular, our estimators for zero- 
eigenvalues enjoy the faster convergence (or slower divergence) rates, from 
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which the proposed ratio-based estimator for the number of factors benefits. 
In fact when all the factors are strong, the performance of our estimation for 
the number of factors improves when the dimension of time series increases. 
This phenomenon is coined as "blessing of dimensionality." 

The new contributions of this paper include: (i) the ratio-based estimator 
for the number of factors and the associated asymptotic theory which un- 
derpins the "blessing of dimensionality" phenomenon observed in numerical 
experiments, and (ii) a two-step estimation procedure when the factors are 
of different degrees of strength. We focus on the results related to the esti- 
mation for the number of factors in this paper. The results on the estimation 
of the factor loading space under the assumption that the number of factors 
is known are reported in [13]. 

There exists a large body of literature in econometrics and finance on 
factor models for high-dimensional time series. However, most of them are 
based on a different viewpoint, as those models attempt to identify the 
common factors that affect the dynamics of most original component series. 
In analyzing economic and financial phenomena, it is often appealing to 
separate these common factors from the so-called idiosyncratic components: 
each idiosyncratic component may at most affect the dynamics of a few 
original time series. An idiosyncratic series may exhibit serial correlations 
and, therefore, may be a time series itself. This poses technical difficulties in 
both model identification and inference. In fact the rigorous definition of the 
common factors and the idiosyncratic components can only be established 
asymptotically when the dimension of time series tends to infinity; see [6, 8]. 
Hence those factor models are only asymptotically identifiable. According to 
the definition adopted in this paper, both "the common factors" and those 
serially correlated idiosyncratic components will be identified as factors. This 
is not ideal for the applications with the purpose to identify those common 
factors. However, this makes the tasks of model identification and inference 
much simpler. 

The rest of the paper is organized as follows. The model and the estima- 
tion methods are introduced in Section 2. The sampling properties of the 
estimation methods are investigated in Section 3. Simulation results are in- 
serted whenever appropriate to illustrate the various asymptotic properties 
of the methods. Section 4 deals with the cases when different factors are of 
different strength, for which a two-step estimation procedure is preferred. 
The analysis of two real data sets is reported in Section 5. All mathematical 
proofs are relegated to the Appendix. 

2. Models and estimation. 

2.1. Models. If we are interested in the linear dynamic structure of yt 
only, conceptually we may think that yt consists of two parts: a static part 
(i.e., a white noise), and a dynamic component driven by, hopefully, a low- 
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dimensional process. This leads to the decomposition: 
(2.1) y i = Ax t + £ i , 



where xt is an r x 1 latent process with (unknown) r < p, A is a p x r 
unknown constant matrix, and Et ~ WN(/i £ , S e ) is a vector white-noise pro- 
cess. When r is much smaller than p, we achieve an effective dimension- 
reduction, as then the serial dependence of yt is driven by that of a much 
lower-dimensional process x$. We call xt a factor process. The setting (2.1) 
may be traced back at least to [18]; see also its further development in 
dealing with cointegrated factors in [19]. 

Since none of the elements on the RHS of (2.1) are observable, we have 
to characterize them further to make them identifiable. First we assume 
that no linear combinations of x^ are white noise, as any such compo- 
nents can be absorbed into St [see condition (CI) below]. We also assume 
that the rank of A is r. Otherwise (2.1) may be expressed equivalently in 
terms of a lower-dimensional factor. Furthermore, since (2.1) is unchanged 
if we replace (A,xj) by (AH,H~ 1 x^) for any invertible r x r matrix H, we 
may assume that the columns of A = (ai, . . . ,a r ) are orthonormal, that is, 
A' A = I r , where I r denotes the r x r identity matrix. Note that even with 
this constraint, A and xj are not uniquely determined in (2.1), as the afore- 
mentioned replacement is still applicable for any orthogonal H. However, 
the factor loading space, that is, the r-dimensional linear space spanned by 
the columns of A, denoted by A4(A), is uniquely defined. 

We summarize into condition (CI) all the assumptions introduced so far: 

(CI) In model (2.1), St ~ WN(/x e , X! £ ). If c'X^ is white noise for a constant 
c G 3l p , then c' Cov(X t+i ! c , et) = for any nonzero integers k. Furthermore 
AA = I r . 

The key for the inference for model (2.1) is to determine the number of 
factors r and to estimate the pxr factor loading matrix A, or more precisely 
the factor loading space A4(A). Once we have obtained an estimator, say, A, 
a natural estimator for the factor process is 



The dynamic modeling for yt is achieved via such a modeling for x^ and 
the relationship yt = A5q. A parsimonious fitting for 5q may be obtained 
by rotating % appropriately [27]. Such a rotation is equivalent to replac- 
ing A by AH for an appropriate r x r orthogonal matrix H. Note that 
.M(A) = A4(AH), and the residuals (2.3) are unchanged with such a re- 
placement. 




% = A'y t 



e t = (I d - AA')y t . 
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2.2. Estimation for A and r . An innovation expansion algorithm is pro- 
posed in [16] for estimating A based on solving a sequence of nonlinear 
optimization problems with at most p variables. Although the algorithm 
is feasible for small or moderate p only, it can handle the situations when 
the factor process is nonstationary. We outline the key idea below, as 
our computationally more efficient estimation method for stationary cases 
is based on the same principle. 

Our goal is to estimate M(A), or, equivalently, its orthogonal complement 
M(B), where B = (bi,...,b p — r ) is a p x (p — r ) matrix for which (A,B) 
forms a p x p orthogonal matrix, that is, B'A = and B'B = I p _ r [see 
also (CI)]. It follows from (2.1) that 

(2.4) B'y t = B'e t , 

implying that for any 1 < j < p — r, {b^y^i = 0, ±1, . . .} is a white-noise 
process. Hence, we may search for mutually orthogonal directions bi,b2, . . . 
one by one such that the projection of yt on each of those directions is a white 
noise. We stop the search when such a direction is no longer available, and 
take p — k as the estimated value of r, where k is the number of directions 
obtained in the search. This is essentially how [16] accomplish the estimation. 
It is irrelevant in the above derivation if is stationary or not. 

However, a much simpler method is available when x^, therefore also y^, 
is stationary: 

(C2) xt is weakly stationary, and Cov(xj, St+k) = for any k > 0. 

In most factor modeling literature, xt and e s are assumed to be uncorrelated 
for any t and s. Condition (C2) requires only that the future white- noise 
components are uncorrelated with the factors up to the present. This en- 
larges the model capacity substantially. Put 

Hy(k) = Cov(y i+fc ,y 4 ), ^x(k) = Cov(x i+fc , x t ), 
^xe(k) = Cov(x t+fc ,e t ). 
It follows from (2.1) and (C2) that 

(2.5) Vy(k) = AYl x (k)A' + A'E XE (k), k > 1. 
For a prescribed integer fco > 1, define 

k 

(2.6) M = £s ( (fc)S y (*)'. 

k=l 

Then M is a p x p nonnegative matrix. It follows from (2.5) that MB = 0, 
that is, the columns of B are the eigenvectors of M corresponding to zero- 
eigenvalues. Hence conditions (CI) and (C2) imply: 

The factor loading space A4(A) is spanned by the eigenvectors of 
M corresponding to its nonzero eigenvalues, and the number of 
the nonzero eigenvalues is r. 
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We take the sum in the definition of M to accumulate the information from 
different time lags. This is useful especially when the sample size n is small. 
We use the nonnegative definite matrix Jj y (k)Jjy(k)' [instead of S J/ (/c)] to 
avoid the cancellation of the information from different lags. This is guaran- 
teed by the fact that for any matrix C, MC = if and only if Y, y (k)'C = 
for all 1 < k < ko- We tend to use small ko, as the autocorrelation is often at 
its strongest at the small time lags. On the other hand, adding more terms 
will not alter the value of r, although the estimation for S y (A;) with large k 
is less accurate. The simulation results reported in [13] also confirm that the 
estimation for A and r, defined below, is not sensitive to the choice of ko- 
To estimate Ai(A), we only need to perform an eigenanalysis on 

ko 

(2.7) M = ^2%(k)%(k)>, 

k=l 

where ^ y {k) denotes the sample covariance matrix of yj at lag k. Then 
the estimator r for the number of factors is defined in (2.8) below. The 
columns of the estimated factor loading matrix A are the r orthonormal 
eigenvectors of M corresponding to its r largest eigenvalues. Note that the 
estimator A is essentially the same as that defined in Section 2.4 of [13], 
although a canonical form of the model is used there in order to define the 
factor loading matrix uniquely. 

Due to the random fluctuation in a finite sample, the estimates for the 
zero-eigenvalues of M are unlikely to be exactly. A common practice is to 
plot all the estimated eigenvalues in a descending order, and look for a cut- 
off value r such that the (r + l)th largest eigenvalue is substantially smaller 
than the r largest eigenvalues. This is effectively an eyeball-test. The ratio- 
based estimator defined below may be viewed as an enhanced eyeball-test, 
based on the same idea as [28]. In fact this ratio-based estimator benefits 
from the faster convergence rates of the estimators for the zero-eigenvalues; 
see Proposition 1 in Section 3.1 below, and also Theorems 1 and 2 in Sec- 
tion 3.2 below. The other available methods for determining r include the 
information criteria approaches of [2, 3] and [9], and the bootstrap approach 
of [4], though the settings considered in those papers are different. 

A ratio-based estimator for r. We define an estimator for the number of 
factors r as follows: 

(2.8) f = argminAj+i/Aj, 

l<i<R 

where Ai > • • • > \ p are the eigenvalues of M, and r < R < p is a constant. 

In practice we may use, for example, R = p/2. We cannot extend the 
search up to p, as the minimum eigenvalue of M is likely to be practically 0, 
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especially when n is small and p is large. It is worthy noting that when p 
and n are on the same order, the estimators for eigenvalues are no longer 
consistent. However, the ratio-based estimator (2.8) still works well. See 
Theorem 2 (hi) below. 

The above estimation methods for A and r can be extended to those non- 
stationary time series for which a generalized lag- A; autocovariance matrix 
is well defined (see, e.g., [19]). In fact, the methods are still applicable when 
the weak limit of the generalized lag-/c autocovariance matrix 

n-l 

S y (k)=n- a Y,(yt+k-y)(yt-y)' 

exists for 1 < k < ko, where a > 1 is a constant. Further developments on 
those lines will be reported elsewhere. For the factor modeling for high- 
dimensional volatility processes based on a similar idea, we refer to [15, 26]. 

3. Estimation properties. Conventional asymptotic properties are estab- 
lished under the setting that the sample size n tends to oo and everything 
else remains fixed. Modern time series analysis encounters the situation when 
the number of time series p is as large as, or even larger than, the sample 
size n. Then the asymptotic properties established under the setting when 
both n and p tend to oo are more relevant. We deal with these two settings 
in Section 3.1 and Sections 3.2-3.4 separately. 

3.1. Asymptotics when n — > oo and p fixed. We first consider the asymp- 
totic properties under the assumption that n — > oo and p is fixed. These 
properties reflect the behavior of our estimation method in the cases when n 
is large and p is small. We introduce some regularity conditions first. Let 
Ai, . . . , \ p be the eigenvalues of the matrix M: 

(C3) yt is strictly stationary and ^-mixing with the mixing coefficients ip(-) 
satisfying the condition that Y2t>i ^(t) 1 ^ 2 < oo. Furthermore, E{\yt | 4 } < oo 
element- wisely. 

(C4) Ai > • • • > A r > = A r+ i = • • • = A p . 

Section 2.6 of [7] gives a compact survey on the mixing properties of time 
series. The use of the -^-mixing condition in (C3) is for technical conve- 
nience. Note that M is a nonnegative definite matrix. All its eigenvalues are 
nonnegative. Condition (C4) assumes that its r nonzero eigenvalues are dis- 
tinct from each other. While this condition is not essential, it substantially 
simplifies the presentation of the convergence properties in Proposition 1 be- 
low. Let 7j be a unit eigenvector of M corresponding to the eigenvalue Aj . 

We denote by (Ai,-y 1 ), . . . , (A p ,7 p ) the p pairs of eigenvalue and eigenvector 

of matrix M: the eigenvalues Aj are arranged in descending order, and the 
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eigenvectors 7j are orthonormal. Furthermore, it may go without explicit 
statement that 7,- may be replaced by — 7^ in order to match the direction 
of 7^ for 1 < j < r. 

Proposition 1. Let conditions (C1)-(C4) hold. Then as n— > 00 (but p 
fixed), it holds that: 

(i) \\j -X 3 -\ = P (n~ l l 2 ) and \\% - jA\ = Op(n~ 1 /2) /or j = i,..., r , 

and 

(ii) Aj = Op(n~ 1 ) forj = r + l,...,p. 

The proof of the above proposition is in principle the same as that of 
Theorem 1 in [4], and is therefore omitted. 

3.2. Asymptotics when n —¥ 00, p —¥ 00 and r fixed. To highlight the rad- 
ically different behavior when p diverges together with n, we first conduct 
some simulations: we set in model (2.1) r = 1, A' = (1, . . . , 1), St are indepen- 
dent iV(0,I p ), and xj = x% is an AR(1) process defined by xt+i = 0.7x t + et- 
We set the sample size n = 50, 100, 200, 400, 800, 1600 and 3200, and the di- 
mension fixed at half the sample size, that is, p = n/2. Let M be defined as 
in (2.6) with ko = 1. For each setting, we draw 200 samples. The boxplots of 
the errors Aj — Aj, i = 1, . . . , 6, are depicted in Figure 1. Note that A^ = for 
i > 2, since r = 1. The figure shows that those estimation errors do not con- 
verge to 0. In fact those errors seem to increase when n (and also p = n/2) 
increases. Therefore the classical asymptotic theory (i.e., n — > 00 and p fixed) 
such as Proposition 1 above is irrelevant when p increases together with n. In 
spite of the lack of consistency in estimating the eigenvalues, the ratio-based 
estimator for the number of factors r (=1) defined in (2.8) works perfectly 
fine for this example, as shown in Figure 2. In fact it is always the case that 
r = 1 in all our experiments even when the sample size is as small as n = 50; 
see Figure 2. 

To develop the relevant asymptotic theory, we introduce some notation 
first. For any matrix G, let ||G|| be the square root of the maximum eigen- 
value of GG', and ||G|| m i n be the square root of the smallest nonzero eigen- 
value of GG'. We write a x b if a = 0(b) and b = 0(a). Recall T, x (k) = 
Cov(x t+ fc,Xi) and T, xt (k) = Cov(x t +k, £*)■ Some regularity conditions are 
now in order: 

(C5) For a constant 5 G [0, 1], it holds that ||Sa;(fc)|| xp 14 x WS x (k)\\ miu . 
(C6) For fc = 0,l,...,*b, ||E xe (fc)|| =o(p 1 ~ 5 ). 

Remark 1 . (i) Condition (C5) looks unnatural. It is derived from more 
natural conditions (3.1) and (3.2) below coupled with the standardization 
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n=50 
p=25 



n=100 
p=50 



n=200 
p=100 



n=400 
p=200 



n=800 
p=400 



n=1600 
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n=3200 
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Estimation errors for the 2nd largest eigenvalue 




n=800 
p=400 



n=1600 
p=800 



n=3200 
p=1 600 



Estimation errors for the 3rd largest eigenvalue 




n=800 
p=400 



n=1600 
p=800 



n=3200 
p=1 600 



Estimation errors for the 4th largest eigenvalue 




n=200 
p=100 



n=400 
p=200 



n=800 
p=400 



n=1600 
p=800 



n=3200 
p=1600 



Estimation errors for the 5th largest eigenvalue 




Estimation errors for the 6th largest eigenvalue 




[G. 1. Boxplots for the errors in estimating the first six eigenvalues of M with r = 1 
id all the factor loading coefficients being 1. 
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n=100, p=50 
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n=200, p=100 
5 p 3 3 3 H 5 



=1 i=2 i=3 i=4 i=5 i=6 i=7 

n=800, p=400 



i=3 i=4 i=5 i=6 i=7 

n=1600, p=800 



i=3 i=4 i=5 

n=3200, p= 



i=6 i=7 

1600 



i=1 i=2 i=3 i=4 i=5 i=6 i=7 



i=1 i=2 i=3 i=4 i=5 



Fig. 2. Boxplots for the ratios Ai+i/A;, u>ii/i r — 1 anrf a// t/ie factor loading coefficients 
1. 



A'A = I r . Since A = (ai, . . . , a r ) is p x r and p — > oo now, it is natural to 
let the norm of each column of A, before standardizing to A'A = I r , tend 
to oo as well. To this end, we assume that 

(3.1) Ha.-fxp 1 -^, j = l,...,r, 

where 5j € [0, 1] are constants. We take 6j as a measure of the strength of the 
factor xtj. We call xtj a strong factor when 5j = 0, and a weak factor when 
5j > 0. Since r is fixed, it is also reasonable to assume that for k = 0, 1, . . . , ko, 

(3.2) |E x (fc)|^0. 

Then condition (C5) is entailed by the standardization A'A = I r under 
conditions (3.2) and (3.1) with Sj = 5 for all j. 

(ii) The condition assumed on H xe (k) in (C6) requires that the correlation 
between Xt+k (k > 0) and is not too strong. In fact under a natural 



FACTOR MODELING FOR HIGH-DIMENSIONAL TIME SERIES 



11 



condition that S xe (/c) = 0(1) element-wisely, it is implied by (3.1) and the 
standardization A' A = I r [hence now Xjt = P {p^l 2 ) as a result of such 
standardization] that \\Yl xe (k)\\ = 0(p 1 ~ s ^ 2 ). 

Now we deal with the convergence rates of the estimated eigenvalues, and 
establish the results in the same spirit as Proposition 1. Of course the con- 
vergence (or divergence) rate for each estimator Aj is slower, as the number 
of estimated parameters goes to infinity now. 

Theorem 1. Let conditions (C1)-(C6) hold and h n = p & n~ 1 / 2 — s> 0. Then 
as n — > oo and p — > oo, it holds that: 

(i) \X i -X i \ = Op(p 2 - s n- 1 / 2 ) fori = l,...,r, and 

(ii) Xj = P {p 2 n~ l ) for j = r + 1, . . . ,p. 

Corollary 1. Under the condition of Theorem 1, it holds that 
Aj+i/Aj x 1 for j = 1, . . . ,r — 1 and X T+ \/X r = Op(p jn) — > 0. 

The proofs of Theorem 1 and Corollary 1 are presented in the Appendix. 
Obviously when p is fixed, Theorem 1 formally reduces to Proposition 1. 
Some remarks are now in order. 

Remark 2. (i) Corollary 1 implies that the plot of ratios Aj+i/Aj, 
i = 1,2, ... , will drop sharply at i = r. This provides a partial theoretical 
underpinning for the estimator r defined in (2.8). Especially when all fac- 
tors are strong (i.e., 5 = 0), X r+ i/X r = Op{n~ l ). This convergence rate is 
independent of p, suggesting that the estimation for r may not suffer as p 
increases. In fact when all the factors are strong, the estimation for r may 
improve as p increases. See Remark 3(iv) in Section 3.4 below. 

(ii) Unfortunately, we are unable to derive an explicit asymptotic ex- 
pression for the ratios Aj+i/Aj with i > r, although we make the following 
conjecture: 

(3.3) Aj+i/AjAl, j = (k + l)r + l,...,(k + l)r + K, 

where ko is the number of lags used in defining matrix M in (2.6), and 
K > 1 is any fixed integer. See also Figure 2. Further simulation results, not 
reported explicitly, also conform with (3.3). This conjecture arises from the 
following observation: for j > (ko + l)r, the jth largest eigenvalue of M is 
predominately contributed by the term Ylk°=i ^e(^)^£ W which has a clus- 
ter of largest eigenvalues on the order of p 2 /n 2 , where 5] £ (fc) is the sample 
lag-A; autocovariance matrix for St- See also Theorem 2(iii) in Section 3.4 
below. 
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Table 1 

Relative frequency estimates for P(f~ — r) in the simulation with 200 replications 







n 


50 


100 


200 


400 


800 


1600 


3200 


5 = 





p = Q.2n 


0.165 


0.680 


0.940 


0.995 


1 


1 


1 






p = 0.5n 


0.410 


0.800 


0.980 


1 


1 


1 


1 






p = 0.8n 


0.560 


0.815 


0.990 


1 


1 


1 


1 






p= 1.2n 


0.590 


0.820 


0.990 


1 


1 


1 


1 


8 = 


0.5 


p = 0.2n 


0.075 


0.155 


0.270 


0.570 


0.980 


1 


1 






p = 0.5n 


0.090 


0.285 


0.285 


0.820 


0.960 


1 


1 






p = 0.8n 


0.060 


0.180 


0.490 


0.745 


0.970 


1 


1 






p = 1.2n 


0.090 


0.180 


0.310 


0.760 


0.915 


1 


1 



(iii) The errors in estimating eigenvalues are on the order of p 2 5 n l l 2 or 
p 2 n _1 , and both do not necessarily converge to 0. However, since 

-J^— = Op^n- 1 ' 2 ) = P (h n ) = o P (l) 
|Aj — Aj| 

for any 1 < i < r and r < j <p, 

the estimation errors for the zero-eigenvalues is asymptotically of an order 
of magnitude smaller than those for the nonzero-eigenvalues. 

3.3. Simulation. To illustrate the asymptotic properties in Section 3.2 
above, we report some simulation results. We set in model (2.1) r = 3, 
n = 50, 100, 200, 400,800,1600 and 3200, and p = 0.2n, 0.5n, 0.8re and 1.2n. 
All the p x r elements of A are generated independently from the uniform 
distribution on the interval [—1,1] first, and we then divide each of them 
by p^l 2 to make all three factors of the strength 5; see (3.1). We generate 
factor X£ from a 3 x 1 vector-AR(l) process with independent iV(0, 1) inno- 
vations and the diagonal autoregressive coefficient matrix with 0.6, —0.5 and 
0.3 as the main diagonal elements. We let et in (2.1) consist of independent 
N(0, 1) components and they are also independent across t. We set ko = 1 
in (2.6) and (2.7). For each setting, we replicate the simulation 200 times. 

Table 1 reports the relative frequency estimates for the probability P(r = 
r) = P(r = 3) with 5 = and 0.5. The estimation performs better when 
the factors are stronger. Even when the factors are weak (i.e., 6 = 0.5), the 
estimation for r is very accurate for n > 800. When the factors are strong 
(i.e., 5 = 0), we observe a phenomenon coined as "blessing of dimensionality" 
in the sense that the estimation for r improves as the dimension p increases. 
For example, when the sample size n = 100, the relative frequencies for 
?=r are, respectively, 0.68, 0.8, 0.815 and 0.82 for p = 20,50, 80 and 120. 
The improvement is due to the increased information on r from the added 
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Fig. 3. Boxplots for the ratios Xi+i/Xi with two strong factors (8 = 0) and one weak 
factor (5 = 0.5 ) and r = 3, p= n/2. 



components of y~t when p increases. When 5 = 0.5, the columns of A are 
p- vectors with the norm p - 25 [see (3.1)]. Hence we may think that many 
elements of A are now effectively 0. The increase of the information on the 
factors is coupled with the increase of "noise" when p increases. Indeed, 
Table 1 shows that when factors are weak as 5 = 0.5, the estimation for r 
does not necessarily improve as p increases. 

We also experiment with a setting with two strong factors (with 5 = 0) 
and one weak factor (with 5 = 0.5). Then the ratio-based estimator r tends 
to take two values, picking up the two strong factors only. However Fig- 
ure 3 indicates that the information on the third weak factor is not lost. In 
fact, Aj+i/Aj tends to take the second smallest value at i = 3. In this case 
a two-step estimation procedure should be employed in order to identify the 
number of factors correctly; see Section 4 below. 
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3.4. Improved rates for the estimated eigenvalues. The rates in Theo- 
rem 1 can be further improved, if we are prepared to entertain some addi- 
tional conditions on St in model (2.1). Such an improvement is relevant as 
the condition that h n = p s n~ 1 / 2 — > 0, required in Theorem 1, is sometimes 
unnecessary. For example, in Table 1, the ratio-based estimator r works 
perfectly well when 5 = 0.5 and n is sufficiently large (e.g., n > 800), even 
though h n = (p/n) 1 / 2 -ft 0. Furthermore, in relation to the phenomenon of 
"blessing of dimensionality" exhibited in Table 1, Theorem 1 fails to reflect 
the possible improvement on the estimation for r when p increases; see also 
Remark 2(i). We first introduce some additional conditions on e^: 

(C7) Let Sjt denote the jth component of £t- Then Ejt are independent 
for different t and j, and have mean and common variance a 2 < oo. 
(C8) The distribution of each Ejt is symmetric. Furthermore, E(e 2 ^ +l ) = 

0, and E(e 2 t ) < (rk) k for all 1 < j <p and t,k>l, where r > is a constant 
independent of j,t,k. 

(C9) All the eigenvalues of X! e are uniformly bounded as p — > oo. 

The moment condition E(e 2 ^) < (rk) k in (C8) implies that Ejt are sub- 
Gaussian. Condition (C9) imposes some constraint on the correlations among 
the components of St ■ When all components of {et} are independent N(0, a 2 ), 
(C7)-(C9) hold. See also conditions (i')-(iv') of [17]. 

Theorem 2. Let conditions (C1)-(C8) hold, £ n = p s / 2 n~ 1 / 2 -> and 
n = 0(p). Then as p,n—> oo, the following assertions hold: 

(i) \\j -X j \ = Op(p 2 - 3< V2 n -i/2) for j = 1, . . . ,r, 

(ii) Xj = Op(p 2 - 5 n- x ) forj = r + l,...,(k + l)r, 
(hi) Xj = P (p 2 n- 2 ) for j = (k + l)r + 1, . . . ,p. 

If in addition (C9) holds, the rate in (ii) above can be further improved 

to 

(3.4) Xj = Opip^n' 1 ' 2 ), j = r + 1, . . . , (k + l)r. 

Corollary 2. Under the conditions of Theorem 2, it holds that 
Aj+i/Aj-xl, j = l,...,r-l, and X r+ i/X r = P {p 5 n~ 1 ). 

If in addition (C9) also holds, A r+ i/A r = Op(p^^ 2 n _1 ^). 

The proofs of Theorem 2 and Corollary 1 are given in the Appendix. 

Remark 3. (i) By comparing with Theorem 1, the error rate for nonze- 
ro A,- in Theorem 2 is improved by a factor p~^l 2 , the error rate for zero- 
eigenvalues is by a factor p~ 5 at least. However, those estimators themselves 
may still diverge, as illustrated in Figure 1. 
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(ii) Theorem 2(iii) is an interesting consequence of the random matrix 
theory. The key message here is as follows: for the eigenvalues corresponding 
purely to the matrix X^itLi S E (i;)S E (fe)', their magnitudes adjusted for p 2 ~ 2S 
converge at a super-fast rate. The matrix ^fcli S e (fe)S 6 (fey is a part of M 
in (2.7), where S e (fc) is the sample lag-A; autocovariance matrix for {et}. In 
particular, when all the factors are strong (i.e., 5 = 0), the convergence rate 
is n~ 2 . Such a super convergence rate never occurs when p is fixed. 

(iii) Condition i n — > is mild, and is weaker than condition h n — > re- 
quired in Theorem 1. For example, when pxn, this condition is implied by 
the condition 5 € [0, 1). 

(iv) With additional condition (C9), A r+ i/A r = Op(p~ l / 2 n~ l / n ) when all 
factors are strong. This shows that the speed at which A r +i/A r converges to 
increases when p increases. This property gives a theoretical explanation 
why the identification for r becomes easier for larger p when all factors are 
strong (i.e., 5 = 0). See Table 1. 

4. Two-step estimation. In this section, we outline a two-step estimation 
procedure. We will show that it is superior than the one-step procedure 
presented in Section 2.2 for the determination of the number of factors as 
well as for the estimation of the factor loading matrices in the presence of the 
factors with different degrees of strength. A similar procedure is described 
in [19] to improve the estimation for factor loading matrices in the presence 
of small eigenvalues, although they gave no theoretical underpinning on why 
and when such a procedure is advantageous. 

Consider model (2.1) with r\ strong factors with strength 5\ = and r 2 
weak factors with strength 52 > 0, where r\ + r 2 = r. Now (2.1) may be 
written as 

(4.1) y t = Ax t + e t = Aix w + A 2 x 2t + e t , 

where xj = (x' lt x 2i )', A = (Ai A 2 ) with A' A = I r , xif consists of r\ strong 
factors, and x 2 £ consists of r 2 weak factors. Like model (2.1) in Section 2.1, 
A = (Ai, A 2 ) and x< = (xi t ,x 2t ) are not uniquely defined, but only A4(A) 
is. Hereafter A = (Ai, A 2 ) corresponds to a suitably rotated version of the 
original A in model (4.1), where now A contains all the eigenvectors of M 
corresponding to its nonzero eigenvalues. Refer to (2.6) for the definition 
of M. 

To present the two-step estimation procedure clearly, let us assume that 
we know r± and r 2 first. Using the method in Section 2.2, we first obtain the 
estimator A = (Ai, A 2 ) for the factor loading matrix A = (Ai, A 2 ), where 
the columns of Ai are the r\ orthonormal eigenvectors of M corresponding 
to its r\ largest eigenvalues. In practice we may identify r\ using, for exam- 
ple, the ratio-based estimation method (2.8); see Figure 3. We carry out the 
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second-step estimation as follows. Let 

(4.2) yt=yt-AiAiy t 

for all t. We perform the same estimation for data {y£ } now, and obtain the 
pxr2 estimated factor loading matrix A2 for the ri weak factors. Combining 
the two estimators together, we obtain the final estimator for A as 

(4.3) A = (Ai,A 2 ). 

Theorem^ 3 belowjpresents the convergence rates_for both _the one-step 
estimator A = (Ai, A2) and the two-step estimator A = (Ai, A2). It shows 
that A converges to A at a faster rate than A. The results are established 
with known r\ and r%. In practice we estimate r\ and T2 using the ratio- 
based estimators. See also Theorem 4 below. We introduce some regular- 
ity conditions first. Let Xi2(&) = Cov(xi it+ fc,X2t), S 2 i(fc) = Cov^^+fc, xit), 
Si(fe) =Cov(x i)t+fe ,x it ) and S ie (fe) = Cov(x i>t+fc , e t ) for 2=1,2: 

(C5)' For i = l,2, l<k<k , ||Ei(fc)|| xp 1 ^ x ^(AOlUn, ||E 2 i(fc)l| ~ 

II E2I (fe) ||min 

and ||Si 2 (A:)|| =0(p 1 -* 2 / 2 ). 
(C6)' Cov(x t ,e s ) = for any i, s. 

The condition on Ej(fc) in (C5)' is an analogue to condition (C5). See Re- 
mark l(i) in Section 3.2 for the background of those conditions. The order of 
||^2i(fe)||min will be specified in the theorems below. The order of ||Si 2 (A;)|| 
is not restrictive, since p l ~ 52 / 2 is the largest possible order when 5\ = 0. See 
also the discussion in Remark l(ii). Condition (C6)' replaces condition (C6). 
Here we impose a strong condition Ej e (A;) = to highlight the benefits of 
the two-step estimation procedure. See Remark 4(iii) below. Put 

W, = (Si(l), . . . , Vifa)), W 2 i = (S 2 i(l), • • • , S 2 i(fc )). 

Theorem 3. Let conditions (G1)-(C4), (C5)', (C6)', (C7) and (C8) 
hold. Let n = 0(p) and K n = p &2 ^ 2 n I 2 — > 0, as n — » 00. Then it holds that 

||Ai - Ai|| = Op{n~ l l 2 ), ||A 2 - A 2 || = P ( Kn ) = ||A - A||. 
Furthermore, 

||A 2 - A 2 || = P {u n ) = ||A - A||, 

if, in addition, v n — >■ and ■p c ^ 2 n ~ 1 l 2 _ ). ; where v n and c are defined as 
follows: 

(p s *K n , 2/||W2i|| min = o(p 1 - <52 ) (c=l); 

i/ n = J p (2c-i)* aKn) ^ ||W2i|| min xp 1 "* /or 1/2 < c < 1, and 
I l|WiW 21 || < gllWilUinllWaiH for <<?< 1. 

Note that Kn/un — > 0. Theorem 3 indicates that between Ai and A2, the 
latter is more difficult to estimate, and the convergence rate of an estimator 
for A is determined by the rate for A 2 . This is intuitively understandable 
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as the coefficient vectors for weak factors effectively contain many zero- 
components; see (3.1). Therefore a nontrivial proportion of the components 
of 

Yt may contain little information on weak factors. When ||W2i|| m i n x 
||W2|| is dominated by ||W2i|| m i n - The condition ||WiW 21 || < 
gllWill m i n 1 1 W21 1 1 for < q < 1 is imposed to control the behavior of the 
(ri + l)th to the rth largest eigenvalues of M under this situation. If this is 
not valid, those eigenvalues can become very small and give a bad estimator 
for A2, and thus A. Under this condition, the structure of the autocovari- 
ance for the strong factors, and the structure of the cross-autocovariance 
between the strong and weak factors, are not similar. 

Recall that Xj and Xj are the jth largest eigenvalue of, respectively, M 

defined in (2.6) and M defined in (2.7). We define matrices M* and M* in the 
same manner as M and M but with {yt} replaced by {yj? } defined in (4.2), 
and denote by A!- and A* the jth largest eigenvalue of, respectively, M* 

and M*. The following theorem shows the different behavior of the ratio of 
eigenvalues under the one-step and two-step estimation. Readers who are 
interested in the explicit rates for the eigenvalues are referred to Lemma 1 
in the Appendix. 

Theorem 4. Under the same conditions of Theorem 3, the following 
assertions hold: 

(i) For l<i<r 1 orr x <i<r, A i+ i/Aj >; 1. Fori < 1 < r 2 , X* j+1 /X* x 1. 

(ii) A r+ i/A r and A n +i/A n = o p (A, r+ i/A r ) provided 

5 2 > l/(8c - 1), p^^n- 1 ' 2 -> 0, p (6 C -l/2)5 2 -l/2 n -l/2 ^ ^ 

(iii) A r+ i/A r A and X* 2+l /X* 2 = o p (X r +i/%) provided p (4c-3/2)<5 2 -l/2 x 
n 1 / 2 — s> 00. 

Remark 4. (i) Theorem 4(i) and (ii) imply that the one-step estimation 
is likely to lead to r = r\. For instance, when pxn, then Theorem 4(h) says 
that A ri +i/A ri has a faster rate of convergence than A r +i/A r as long as 
82 > 2/5. Figure 3 shows exactly this situation. 

(ii) Theorem 4 (iii) implies that the two-step estimation is more capable to 
identify the additional r 2 factors than the one-step estimation. In particular, 
if p x n, A* 2+1 /A* 2 always has a faster rate of convergence than A r+ i/A r . Un- 
fortunately we are unable to establish the asymptotic properties for Aj+i/Aj 
for i> r, and A* +1 /A* for j > r 2 , though we believe that conjectures similar 
to (3.3) continue to hold. 

(iii) When 5±> and/or the cross-autocovariances between different fac- 
tors and the noise are stronger, the similar and more complex results can be 
established via more involved algebra in the proofs. 
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(b) 

Fig. 4. Plots of the estimated eigenvalues (a) and the ratios of estimated eigenvalues 
o/M (b) for Example 1. 



5. Real data examples. We illustrate our method using two real data 
sets. 

Example 1. We first analyze the daily returns of 123 stocks in the 
period 2 January 2002-11 July 2008. Those stocks were selected among those 
included in the S&P500 and were traded every day during the period. The 
returns were calculated in percentages based on the daily close prices. We 
have in total n = 1642 observations with p = 123. We apply the eigenanalysis 
to the matrix M defined in (2.7) with fco = 5. The obtained eigenvalues (in 
descending order) and their ratios are plotted in Figure 4. It is clear that the 
ratio-based estimator (2.8) leads to r = 2, indicating two factors. Varying the 
value of fcojaetween 1 and 100 in the definition of M leads to little change in 
the ratios Aj+i/Aj, and the estimate r = 2 remains unchanged. Figure 4(a) 
shows that Aj is close to for all % > 5. Figure 4(b) indicates that the ratio 
A«+i/Aj is close to 1 for all large i, which is in line with conjecture (3.3). 

The first two panels of Figure 5 display the time series plots of the two 
component series of the estimated factors defined as in (2.2). Their cross- 
autocorrelations are presented in Figure 6. Although each of the two esti- 
mated factors shows little significant autocorrelation, there are some signif- 
icant cross-correlations between the two series. The cross-autocorrelations 
of the three residual series ~fjYt for j = 3, 4, 5 are not significantly differ- 
ent from 0, where ■jj is the unit eigenvector of M corresponding to its jth 
largest eigenvalue. If there were any serial correlations left in the data after 
extracting the two estimated factors, those correlations are most likely to 
show up in those three residual series. 
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Fig. 5. The time series plots of the two estimated factors and the return series of the 
S&P500 index in the same time period. 

Figure 4 may suggest the existence of a third and weaker factor, though 
there are hardly any significant autocorrelations in the series 'j'^yt- In fact 
A3 = 6.231 and A4/A3 = 0.357. Note that now Xj is not necessarily a consis- 

tent estimator for A,- although A r +i/A r — > 0; see Theorem l(ii) and Corol- 
lary 1. To investigate this further, we apply the two-step estimation pro- 
cedure presented in Section 4. By subtracting the two estimated factors 
from the above, we obtain the new data [see (4.3)]. We then calculate 

the eigenvalues and their ratios of the matrix M*. The minimum value of 
the ratios is A^/A^ = 0.667, which is closely followed by A3/A2 = 0.679 and 
A4/A3 = 0.744. There is no evidence to suggest that A^/A^ -> 0; see Theo- 
rem 4. This reinforces our choice r = 2. 

With p as large as 123, it is difficult to gain insightful interpretation on the 
estimated factors by looking through the coefficients in A [see (2.2)]. To link 
our fitted factor model with some classical asset pricing theory in finance, 
we wonder if the market index (i.e., the S&P500 index) is a factor in our 
fitted model, or more precisely, if it can be written as a linear combination 
of the two estimated factors. When this is true, Pu = 0, where u is the 
1642 x 1 vector consisting of the returns of the S&P500 index over the 
same time period, and P denotes the projection matrix onto the orthogonal 
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Fig. 6. The cross-autocorrelations of the two estimated factors for Example 1. 



complement of the linear space spanned by the two component series xj, 
which is a 1640-dimensional subspace in i? 1642 . This S&P500 return series 
is plotted together with the two component series x$ in Figure 5. It turns 
out that ||Pu|| 2 is not exactly but ||Pu|| 2 /||u|| 2 = 0.023, that is, the 97.7% 
of the S&P500 returns can be expressed as a linear combination of the two 
estimated factors. Thus our analysis suggests the following model for yt — the 
daily returns of the 123 stocks: 

y t =a 1 u t + & 2 vt + e t , 

where ut denotes the return of the S&P500 on the day t, Vt is another factor, 
and St is a 123 x 1 vector white-noise process. 

Figure 5 shows that there is an early period with big sparks in the two es- 
timated factor processes. Those sparks occurred around 24 September 2002 
when the markets were highly volatile and the Dow Jones Industrial Aver- 
age had lost 27% of the value it held on 1 January 2001. However, those 
sparks are significantly less extreme in the returns of the S&P500 index; 
see the third panel in Figure 5. In fact the projected S&P500 return Pu is 
the linear combination of those two estimated factors with the coefficients 
(—0.0548,0.0808). Two observations may be drawn from the opposite signs 
of those two coefficients: (i) there is an indication that those two factors draw 
the energy from the markets with opposite directions, and (ii) the portfolio 
S&P500 index hedges the risks across different markets. 

Example 2. We analyze a set of monthly average sea surface air pres- 
sure records (in Pascal) from January 1958 to December 2001 (i.e., 528 
months in total) over a 10 x 44 grid in a range of 22.5°-110° longitude in the 
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Fig. 7. Plots o/Ai+i/A; — the ratio of the eigenvalues o/M (the top panel) and M* (the 
bottom panel), against i, for Example 2. 

North Atlantic Ocean. Let Pt(u,v) denote the air pressure in the ith month 
at the location (u, v), where u = 1, . . . , 10, v = 1, . . . , 44 and t = 1, . . . , 528. 
We first subtract each data point by the monthly mean over the 44 years at 
its location: ^ Ylt=i -fi2(i-i)+j( n > v )i where j = 1, . . . , 12, representing the 12 
different months over a year. We then line up the new data over 10 x 44 = 440 
grid points as a vector y t , so that yt is a p-variate time series with p = 440. 
We have n = 528 observations. 

To fit the factor model (2.1) to yt, we calculate the eigenvalues and the 
eigenvectors of the matrix M defined in (2.7) with ko = 5. Let Ai > A2 > • • • 
denote the eigenvalues of M. The ratios Aj+i/Aj are plotted against i in 
the top panel of Figure 7 which indicates the ratio-based estimate for the 
number of factor is r = 1; see (2.8). However, the second smallest ratio is 
A4/A3. This suggests that there may exist two weaker factors in addition; 
see Theorem 4(ii) and also Figure 3. We adopt the two-step estimation 
procedure presented in Section 4 to identify the factors of different strength. 
By removing the factor corresponding to the largest eigenvalue of M, the 
resulting "residuals" are denoted as y*[] see (4.2). Now we repeat the factor 
modeling for data y^ , and plot the ratios of eigenvalues of matrix M* in the 
second panel of Figure 7. It shows clearly the minimum value at 2, indicating 
further two (weaker) factors. Combining the above two steps together, we 
set r = 3 in the fitted model. We repeated the above calculation with ko = 1 
in (2.7). We still find three factors with the two-step procedure, and the 
estimated factors series are very similar to the case when ko = 5. This is 
consistent with the simulation results in [13], where they showed empirically 
that the estimated factor models are not sensitive to the choice of ko. 

We present the time series plots for the three estimated factors 5q = A'y^ 
in Figure 8, where A is a 440 x 3 matrix with the first column being the 
unit eigenvector of M corresponding to its largest eigenvalue, and the other 

two columns being the orthonormal eigenvectors of M* corresponding to its 
two largest eigenvalues; see (4.3) and also (2.2). They collectively account 
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Fig. 8. Time series plot of the three estimated factors for Example 2. 

for 85.3% of the total variation in y< which has 440 components. In fact 
each of the three factors accounts for, respectively, 57.5%, 18.2% and 9.7% 
of the total variation of yt. Figure 9 depicts the factor loading surfaces of 
the three factors. Some interesting regional patterns are observed from those 
plots. For example, the first factor is the main driving force for the dynamics 
in the north and especially the northeast. The second factor influences the 
dynamics in the east and the west in the opposite directions, and has little 
impact in the narrow void between them. The third factor impacts mainly 
the dynamics of the southeast region. We also notice that none of those 
factors can be seen as idiosyncratic components as each of them affects 
quite a large number of locations. 




10 20 30 




Fig. 9. Factor loading surface of the first, second and third factors (from left to right) 
for Example 2. 
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Fig. 10. Example 2: sample cross-correlation functions for the three estimated factors. 



Figure 10 presents the sample cross-correlations for the three estimated 
factors. It shows significant, though small, autocorrelations or cross-correla- 
tions at some nonzero lags. Figure 11 is the sample cross-correlations for 
three residuals series selected from three locations for which one is far apart 
from the other two spatially, showing little autocorrelations at nonzero lags. 
This indicates that our approach is capable to identify the factors based on 
serial correlations. 

Finally we note that the BIC method of [2] yields the estimate r = n = 528 
for this particular data set. We suspect that this may be due to the fact 
that [2] requires all the eigenvalues of S £ be uniformly bounded when p — > oo. 
This may not be the case for this particular data set, as the nearby locations 
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Fig. 11. Example 2: sample cross-correlation functions for three residual series. 50 rep- 
resents grid position (10, 5), 100 for (10, 10) and 400 for (10, 40). 



are strongly spatially correlated, which may lead to very large and also 
very small eigenvalues for S e . Indeed, for this data set, the three largest 
eigenvalues of Ti e are on the order of 10 6 , and the three smallest eigenvalues 
are practically 0. Since the typical magnitude of et is 10 2 from our analysis, 
we have done simulations (not shown here) showing that the typical largest 
eigenvalues for S e , if {et} is weakly correlated white noise, should be around 
10 4 to 10 5 , and the smallest around 10 2 to 10 3 when p = 440 and n = 528. 
Such a huge difference in the magnitude of the eigenvalues suggests strongly 
that the components of the white-noise vector et are strongly correlated. 
Our method does not require the uniform boundedness of the eigenvalues 
of S e . 
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APPENDIX 

Proof of Theorem 1. We present some notational definitions first. 
We denote by Xj,jj the jth largest eigenvalue of M and the corresponding 
orthonormal eigenvector, respectively. The corresponding population values 
are denoted by Xj and aj for the matrix M. Hence A = ( ; y 1 , . . . ,~j r ) and 
A = (ai, . . . , a r ). We also have 

Xj = a' j Ma j , X j =j j M : y j , j = l,...,p. 

We show some intermediate results now. With conditions (C3) and (C5) 
and the fact that {et} is white noise, we have 

\\± x (k) -V x {k)\\=0 P {p 1 - 5 n- 1 / 2 ), 

(A.l) 

\\H xe (k) ~ V xe (k)\\, \\X ex {k)\\ = P (p l ^l 2 n~ l l 2 ), 

where k = 0, 1, . . . , ko- Then following the proof of Theorem 1 of [13], we 
have the following for k = 1 , . . . , ko : 

\\M - M|| = P (\\V v (k)\\ ■ ||E„(fc) - E v (fc)||) 
where ||E„(fc)|| = 0(p 1_<5 ), 

( A -2) 

||E y (fc) ~ E„(fc)|| = Opip^U-^+p 1 ' 5 / 2 ^ 1 / 2 + ||E e (fc)||) 

= Op(p 1 ~ 5 ' 2 n- l / 2 + \\%{k)\\)- 

Now ||E e (fc)|| < ||E e (fc)|| F = P {pn- 1 / 2 ), where ||M|| F = trace (MM') de- 
notes the Frobenius norm of M. Hence from (A. 2), 

\\%(k) -E v (jfe)|| =0 P (pn~ 1/2 ) and 

(A.3) 

||M - M|| = P (p 1 " 5 • pn~ 1/2 ) = P (p 2 - 5 n~ l/2 ). 
For the main proof, consider for j = 1, . .. , r, the decomposition 
Xj - Xj = %Mrfj - a'jMaj = h + h + h + h + h 

where h = (% - Bj)'(M - M)% h = (jj - aj)'M(jj - aj), 

( AA ) _ 

I z = {% - SLjYMsLj , h = a!j (M - M)^- , 

/ 5 = a;.M(7 i -a i ). 

For j = 1,. . . , r, since — aj|| < ||A — A|| = Op(h n ) where h n = p <5 n~ 1 / 2 , 

and ||M|| < Ylk=i l|E y (A;)|| 2 = P (p 2 - 2S ) by (A.2), together with (A.3) we 
have that 

\\h\\, \\i 2 \\ = o P (p 2 - 2S h 2 n ), \\i 3 \\, \\h\\, \\i 5 \\ = o P (p 2 ~ 2S h n ), 

so that \Xj — Xj\ = Op(p 2 ~ 2S h n ) = Op(p 2 ~ s n~ 1 ^ 2 ), which proves Theorem l(i). 
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Now consider j = r + 1, . . . ,p. Define 

fco 

M = ^2± y (k)S y (k)', B = (7 P+1 ,...,%), B = (a r+1 ,...,a p ). 

fc=l 

Following the same proof of Theorem 1 of [13], we can actually show that 
||B - B || = P (h n ), so that - a,,-|| < ||B - B|| = P {h n ). 
Noting Xj = for j = r + 1, . . . ,p, consider the decomposition 

\ j =l' j NVy j =K 1 + K 2 + K 3 

where K x = 7'-(M — M — M' + M)7„-, 

(A.5) 

^ 2 = 2 7 ;(M-M)(7 J -a J ), 
^3 = (7 J -a i ) / M(7 J -a j ). 

Using (A. 3), 

fco fco 

fc = l fc = l 

Similarly, using (A. 2) and (A. 3), and ||B — B|| = P (h n ), we can show that 
| if 2 1 = O p (\\M - M|| • ||7^ - aj-ll) = O p (||M - M|| • ||B - B||) = Op^n" 1 ), 
\K 3 \ = P (\\B - B|| 2 • ||M||) = P {p 2 - 25 h 2 n ) = Optfn- 1 ). 
Hence Xj = Op(p 2 n~ 1 ), and the proof of the theorem is completed. □ 

Proof of Corollary 1. The proof of Theorem 1 of [13] has shown 
that (in the notation of this paper) 

p 2 ~ 25 = 0(X r ). 

But we also have 



A r < Ai = ||M|| <^2\\X y (k)\\ 2 = 0(p 



2-28\ 

fc=l 

l-S\ 



where the last equality sign follows from ||S y (A;)|| = 0{p l ~ b ) in (A.2). Hence 
we have Aj X p 2 ~ 2S for i = 1, . . . , r. 

Letting = |Aj — Aj| for i = 1, . . . , r, we then have = Op(p 2 ~ s n~ 1 ^ 2 ) 
from Theorem l(i). But since h n = p^n~ 1 / 2 = o(l) implying that p 2 ~^n~^l 2 = 
p 2 ~ 2S h n = o(p 2 ~ 2S ), we have = op(Aj). Hence we must have A; x Aj x 
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p2-2& fox i = l,. . . ,r. This implies that X 1 for j = 1, . . . ,r — 1, and 

together with Theorem l(ii), 

A r+1 /A r = P (p 2 n- l /p 2 - 25 ) = Op(p 2& n~ l ) = P (h 2 n ). 
This completes the proof of the corollary. □ 

In the following, we use Oj (M) to denote the jth largest singular value of 
a matrix M, so that o"i(M) = ||M||. We use Ay(M) to denote the jth largest 
eigenvalue of M. 

Proof of Theorem 2. The first part of the theorem is actually Theo- 
rem 2 of [13]. We prove the other parts of the theorem. From equation (22) 
of [13], the sample lag-/c autocovariance matrix for St satisfies 

(A.6) ||S e (fc)|| = Op(pn~ 1 ). 

Note that (A. 2) together with (A.6) implies 

||M - M|| = Opip^ip 1 -^ 2 ^ 1 / 2 + pn~ 1 )) 

= Op{p 2 - 2 \p"l 2 n'l 2 +/n" 1 )) = P (p 2 - 2S i n ), 

since £ n = p 5 l 2 n~ l l 2 = o(l). We also have ||B — B|| = P (£ n ), similar to the 
proof of Theorem 1. 

With these, for j = 1, . . . ,r, using decomposition (A. 4), we have 

\Xj -Xj\ = P (\\M - M||) = o P (p 2 ~ 25 e n ) = Opip 2 - 35 / 2 ^ 1 / 2 ), 

which is Theorem 2(i). For j = r+1, . . . , (ko + l)r, using decomposition (A. 5), 
we have 

Kl = P ((p 1 - 5 / 2 n- 1 / 2 + pn- 1 ) 2 ) = P {p 2 - s n- 1 + p 2 n~ 2 ) = Opip^n' 1 ), 
\K 2 \ = P (\\M - M|| • ||B - B||) = P (p 2 - 2S £ 2 n ) = Opip 2 - 5 ^ 1 ), 
\K 3 \ = P (\\B - B|| 2 • ||M||) = P (p 2 ~ 25 £ 2 n ) = P (p 2 ~ s n^). 

Hence \j = P (p 2 ~ 2& l 2 n ) = P {p 2 ~ & n~ l ), which is Theorem 2(h). 
For part (hi), we dehne 

Wyfo) = (S„(l), • • • , S,(fco)), Wy(k0) = (S,(l), . . . , %(k Q )), 

so that M = W y (k )Wy(k y and M = W y (k )W y (k y . We dehne similarly 
W x (ko), W I£ (fco)) ^W ex (ko) and W e (fco)- Then we can write 

W y (k ) = M 1 + M 2 + W e (k ), 
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where M x = A(W x (k )(I ko ® A') + W xe (*t,)), M 2 = W ex (k )(I h) <g> A'). It 
is easy to see that 

rank(Mi) < r, rank(M2) < kor, 
so that rank(Mi + M2) < (fco + l) r - This implies that 

o-j(Mi + M 2 ) = for j = (Ao + l)r + 1, . . . ,p. 
Then by Theorem 3.3.16(a) of [11], for j = (ko + l)r + 1, . . . ,p, 

Aj = Aj(M) = oJ(W tf (fco)) < (^(Mi + M 2 ) + (n(W e (A;o))) 2 

= of (W 6 (fco)) < X; ||£ e (A;)|| 2 = Op(p 2 n~ 2 ), 
fc=i 

where the last equality sign follows from (A. 6). This proves Theorem 2(iii). 

We prove Theorem 2(h)' now. Using Lemma 3 of [13], with the same 
technique as in the proof of Theorem 1 in their paper, we can write 

(A.7) B = (B + AP)(I + P'P)~ 1/2 with ||P|| =0 P (£ n ). 

With the definition of B as in the proof of Theorem 1, we can write 
A r+ i, the (r + l)th largest eigenvalue of M, as the (1,1) element of the 
diagonal matrix D = B'MB, where MB = BD. But from (A.7), we also 
have B'B = B'(B + AP)(I + P'P)" 1 ^ = (I + P'P)- 1 ^ hence 

(I + P'PJ^B'MB = (I + P'P) 1/2 B'BD = (I + P'P) 1 / 2 (I + P / P)- 1 / 2 D = D. 
Further, by using Neumann series expansions of (I + P'P) 1 /2 and (j + 
p'p)-i/ 2 ) we see that the largest order term of (I + P'P) 1 / 2 B / MB is con- 
tributed from B'M(B + AP), since from (A.7) we have ||P|| = P (£ n ) = 
op(l). Hence the rate of A r +i can be analyzed using the (1,1) element of 
B'M(B + AP). 

Some notation first. Define the column vector of k ones, and 

E rs = (s r , • • • , ^s), -X-r,s = (^r, ■ ■ ■ , X s ) for T S. 

Since k is finite and {ej} and {xj} are stationary, for convenience in this 
proof we take the sample lag- A; autocovariance matrix for {et}, {xt} and the 
cross lag-/c autocovariance matrix between {et} and {x^} to be respectively, 
for k > 0, 

S e (fc) = ?i _1 (E fc+ i in - (n - /c) _1 E fc+ i .„l n _ fc l^_ fc ) 
x (E lin _ fc - (n - fc) -1 Ei in _fcl n _fclJ,_ fc )' 
= n~ 1 E fc+ i jn T n _ fc E / 1 n _ k , 
^x{k) = n 1 Xfc + i i „T n _fcX / 1 
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and 

^xe(k)=n 1 X fc+ljn T ?t _ fc E / lri _ fc , 
where Tj = Ij - Then 

k ko 

B'M(B + AP) = B%(k)± y (k)'(B + AP) = ^F fc (F' fc + G fc ), 

k=l k=l 

where 

Ffc = n~ 1 B / E fc+ljn T n _ fc X' ln _ A .A / + n~ 1 B'E fc+lin T n „ jt E / ln _ A ,, 

Gfc = n 1 AXi in _fcT n _fcX fe+ln P' + n Ei )n _fcT n _fcXj. +lin P' 

+ n^AXi^.iVfcE'^^AP' + n^ELn-fcTn-fcEj^AP'. 
Some tedious algebra (omitted here) shows that the dominating term of 
the above product is YlkLi ^~ 2 B'Efc + i >n T n _fcX' 1 n „ fe Xi ,n— fcTji— fcX fc+1 n P . 
Defining c' lk = (a£, +1 £fc +1 , . . . , a' r+1 £ n ) and pi the first column of P', the 
(1,1) element of the said term is then 

k 

^ n 2c l,fc T n-fcX / l n _fcXi in _fcT n _fcX fe+1 n pi 

fe=l 

ko 

< ^ ri ~ 2 || c 'i,fcll II Pi II ||T„_fc|| 2 ||Xi in _fc|| 2 ||Xfc + i jn || 
fe=i 

ko 

<4^||n- 1 / 2 c 1 ,fc||||P||||7i- 1 / 2 X lin _fc|| 2 ||n- 1 / 2 X fc+lin || 
k=l 

= Op(||n- 1 /2 Clil ||.^. p (3-3 5 )/2 ) _ 

In the last line we used ||ra _1 / 2 Xi n _fc|| = Op(p( 1 ~ <5 '/ 2 ), by noting that 
\\n 1 / 2 X 1>n _fc|| 2 = ||n 1 X 1>n _fcX / lin _ fc || x \\n 1 X ljn _fcT n _fcX / lin _ fc || 

= ||2,(0)|| < ||2,(0) - 2,(0)11 + 112,(0)11 

= Opip^n-l/l) + Opip 1 " 5 ) = Oip 1 " 6 ), 

where ||2,(0) - 2,(0)11 = P {p l ~ 5 n-l/2) is from (A.l) and 112,(0)1) = 
0(p 1 ~ s ) is assumed in condition (C5). With condition (C9), we can show 
that ||n _1 / 2 ci > i|| =Op(l), since 



P(||n- 1 /2 Clil ||>x) = P In- 1 a! r+1 e j e? j a r+1 > 

V j=k+i 



.r 2 



< (n - k)a! r+l Yl e a. r+1 /{nx 2 ) < cr^Jx 2 , 
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(A.8) A 



3 



where we used the Markov inequality with o";^ ax the maximum eigenvalue 
of X! £ , and the fact that a^, +1 a r+ i = 1. Hence the (1, 1) element of B'M(B + 
AP) has rate Op(f/ 3 ~ 3<5 )/ 2 4) = P (p''/ 2 ~ s n~ 1 / 2 ), which is also the rate of Xj 
for j > r + 1. This completes the proof of the theorem. □ 

We outline the proofs of Theorems 3 and 4 below. Detailed proofs can be 
found in the supplemental article (Lam and Yao [12]). 

Outline proof of Theorem 3. First, under model (4.1) and M de- 
fined in (2.6), with conditions (C1)-(C4), (C5)', (C6)', we can show that the 
rates of the eigenvalues of M are given by 

p 2 , for j = l,...,n; 

p 2 ~ 2& \ if IIWmIU^oCp 1 "*) (c=1) 

for j = r\ + 1, . . . , r; 
p 2-2c8 2 ; jf || Wai || min x pi-cfc , 1/2 < c < 1, and 

||WiW^|| < gllWilln^HWaiH.O < q < 1, 
for j = n + 1, . . . ,r. 

For model (4.1), and M* defined in Section 4 by y* in (4.2), we have 
(A.9) A*x/- 2 ^ for j = l,...,r 2 . 

We cannot use Lemma 3 of [13] to prove this theorem for the one-step 
estimation, since the condition ||E|| < sep(Di,D2)/5 gives a restrictive con- 
dition on the growth rate of p, and also restricts the range of 82 allowed. 
Instead, we use Theorem 4.1 of [24]. 

Write M = X^-D^X^ for i^j = 1,2, where X^ = (AjAjB), B is the 
orthogonal complement of A = (A1A2), and Dm is diagonal with Dj, = 
diag(Dj, Dj, 0) where Di contains Xj for j = l,...,r\ and D2 contains Xj 
for j = n + 1, . . . , r. With E = M - M, define 

X'EX = (Eij) for 1 < i,j < 3, 

where Ey = A'-EAj if we denote B = A3. 

Define sep (Mi, M2) = rnin AgA ( Ml ) i/jeA ( M2 ) |A — If we can show that 

||(E ii ,E a )||=op(7o-) 

(A.10) 

D i + E ii E j3 

E3j E33 



with jij = sep Dj + E^, 



then condition (4.2) in [24] is satisfied asymptotically, so that we can use 
their Theorem 4.1 to conclude that for i^j = 1,2, 

(A.ll) ||Ai - A»|| =Op(||(Ey,Ei 3 )||/7ij). 
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Since we can show that ||Ei2|| = Op(||Ei3||) = Op{p 2 n~ 1 / 2 ), we have || (E12 , 
E13) || = Op{p 2 n~ 1 / 2 ). We can also show that 712 x p 2 using (A. 8). Hence 
(A. 10) is satisfied, and (A. 11) implies that 

|| Ai - Ai|| = Op{p 2 rr x l 2 lp 2 ) = P (n- x l 2 ). 

Also, we can show that HE23 1| = Op (E21) = Op(p 2 ~ S2 ^ 2 n~ 1 / 2 ), implying that 
||(E 2 i, E 23 )|| =0 P {p 2 ~^/ 2 n~ 1 / 2 ). We can also show that 721 xp 2 " 2 * us- 
ing (A. 8), provided p c52 n -1 ' 2 — > 0. Hence (A. 10) is satisfied since we assumed 
v n — > 0, and so (A. 11) implies that 

||A 2 - A 2 || = Op^-^n- 1 ^//- 2 ^) = Op^-^n- 1 ' 2 ) = P (u n ), 

which completes the proof for the one-step estimation. 

For the two-step estimation, write M* = (A 2 B*)D*(A 2 B*)', where B* is 
the orthogonal complement of A 2 , and D* is diagonal with D* = diag(D?;, 0). 
The matrix D£ contains A* for j = 1, . . . , r 2 , so that (A. 9) implies sep(D2, 0) x 
p 2 ~ 2 ^. _ 

We can show that ||E*|| = ||M* - M*|| = P {p 2 ~ 2&2 n n ), hence ||E*|| = 
op(sep(D2, 0)), as n n — > 0. Hence we can use Lemma 3 of [13] to conclude 
that 

||A 2 -A 2 ||=Op(||E; i ||/sep(D;,0)). 
Since we can show that HE^I = Op(p 2 ~ 3S2 ^ 2 n~ 1 ^ 2 ), we then have 
|| A 2 - A 2 || = P (p 2 -^/ 2 n- l l 2 /p 2 - 252 ) = Opip^n-V 2 ), 
which completes the proof of the theorem. □ 

To prove Theorem 4, we need two lemmas first. 

Lemma 1. Under the same conditions and notations of Theorem 3, the 
following assertions hold: 

(i) For j = l,...,n, \\j -Xj\ = P (j> 2 n- x l 2 ). 

(ii) For j = r\ + 1, . . . , r, \Xj — Xj\ = Op(p 2 (n~ 1 / 2 + v^)) provided v n — > 0, 

p cS 2n -l/2 0. 

(iii) For j = r + 1, . . . ,p, Xj = Op(p 2 u 2 ), provided v n — > 0, p c5 2 n -i/ 2 q 

(iv) For j = 1 , . . . , r 2 , | A* - A* | = P {p 2 ~ 252 K n ) . 

(v) For j = r 2 + 1, . . . ,p, X* = O p{p 2 ~ 2&2 K 2 n ) . 

(vi) For j = (ko + l)r + 1, . . . ,p, Xj , A* = P (p 2 n - 2 ) = Op{p 2 ~ 2b2 <) . 
(iii)' If in addition condition (C9) holds, then for j = r + 1, . . . ,p, Xj = 

Op(p 3 / 2 v n ), provided v n — > 0, p c&2 n~ l l 2 — >■ 0. 

The proof of this lemma is left in the supplementary materials for this 
paper. Together with (A. 8) and (A. 9), we have the following lemma. 
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Lemma 2. Let conditions (C1)-(C4), (C5)', (C6)', (C7) and (C8) hold. 
Then as n,p — > oo with n = 0(p), and with v n , K n — > the same as in The- 
orem 3 and p c ^ n - 1 / 2 — y 0, we have 

<1, j = l,...,n-l; 

= 0p (n- 1 /2 + z/ 2 +p -25 2)) j = ri; i/||W 21 || min = (p 1 -^) 



Xj+x/Xj < 



(c = l); 

O p ( n -l/2 + z/ 2 +p -2c5 2)) i = ri; J /||W 21 || mill Xp 1 -* 



/or 1/2 < c < 1, and 
llWxW^H^gllWilUnllWsill 
for < g < 1 . 

Furthermore, if || W 2 i || m in = o(p 1_<52 ) and p 5<52 / 2 n -1 / 2 — )• 0, we have 

{xl, j = n + l,...,r-l; 

= P (p^ul), j=r; 
= Op(p 25 ' 2 ^ 1 ^ 2 u n ), j = r, and condition (C9) /toWs. 

//IIWsil^xp 1 -* /or 1/2<c<1, ||WiW 21 ||<g||Wi|| min ||W 21 || /or 
< q < 1, and p( 3c - 1 /2)5 2n -i/2 Qj we ft aue 

{xl, j =rx + l,...,r-l; 

= Op(p 2c5 *v*), j = r; 

= Op(p 2cS2 l / 2 v n ), j = r, and condition (C9) holds. 

For the two-step procedure, let conditions (C1)-(C4), (C5)', (C6)', (C7) 
and (C8) hold and n = 0(p). Then we have 



A* +1 /A* 



1, j = !,■•■, r 2 -l; 

Op(n 2 n ), j = r 2 . 



Proof. We only need to find the asymptotic rate for each Xj and X*j. 
The rate of each ratio can then be obtained from the results of Lemma 1 . 
For j = 1, . .. ,n, from Lemma 1, ||Aj — Aj|| = Op(p 2 n~ 1 ^ 2 ) = op(Xj), and 

hence Xj x Xj xjj 2 from (A. 8). 

Consider the case ||W2i|| m i n "p 1- *. For j = rx + l,...,r, since |Aj — 
\ j \=Op(p 2 (n- 1 / 2 + v 2 n )), we have Xj < Xj + P {p 2 {n~ 1 / 2 + z/ 2 )) = P {p 2 ~ 2c5 * + 
p 2 u 2 +p 2 n -1 / 2 ), and hence 

A ri+1 /A ri = P {{p 2 - 2 ^ +p 2 v 2 n +p 2 n^ 2 )/p 2 ) = P (n^ 2 + u 2 n +p~ 2 ^). 

The other case is proved similarly. 

For j = rx + 1, • • • , r, to make sure Xj will not be zero or close to zero, we 
need 

\Xj -Xj\ = P {p 2 {n- 1 ' 2 + p 2 n )) = opiXj), 
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where Xj xp 2 " 2 * as in (A. 8). Hence we need p 2 (n~ 1 / 2 + v 2 ) = o(p 2 ~ 2c52 ), 
which is equivalent to the condition p( 3c ~V 2 ) <5 2 ra -l/ 2 _^ q_ this condition 

satisfied, then Xj x A,,- x p 2 -2c<5 2 for j = n + 1, . . . , r. Since Aj = Op(p 2 i' 2 ) 
for j = r + 1, . . . ,p, we then have 

A r+1 /A r = P {p 2 v 2 Jp 2 - 2 ^) = P (p 2cS ^ 2 ). 

All other rates can be proved similarly, and thus are omitted. □ 

Proof of Theorem 4. With Lemma 2, Theorem 4(i) is obvious. For 
Theorem 4(h), note that the range of 62 and the rates given in the the- 
orem ensure that n~ x l 2 + u 2 + p~ 2c ^ = o{p 2c52 ~ l / 2 v n ) = o(p 2cS2 u 2 ). Hence 
Lemma 2 implies a better rate of convergence for X ri +i/X ri no matter 
whether condition (C9) holds or not. We can use a similar argument to 
prove part (iii), and details are thus omitted. □ 
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SUPPLEMENTARY MATERIAL 

Detailed proofs of Theorems 3 and 4 (DOI: 10.1214/12-AOS970SUPP; 
.pdf). The document contains detailed proofs of Theorem 3 and 4 in the 
paper. 
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